1,839 research outputs found

    Chinese Spoken Document Summarization Using Probabilistic Latent Topical Information

    Get PDF
    [[abstract]]The purpose of extractive summarization is to automatically select a number of indicative sentences, passages, or paragraphs from the original document according to a target summarization ratio and then sequence them to form a concise summary. In the paper, we proposed the use of probabilistic latent topical information for extractive summarization of spoken documents. Various kinds of modeling structures and learning approaches were extensively investigated. In addition, the summarization capabilities were verified by comparison with the conventional vector space model and latent semantic indexing model, as well as the HMM model. The experiments were performed on the Chinese broadcast news collected in Taiwan. Noticeable performance gains were obtained.

    Geometric Learning of Hidden Markov Models via a Method of Moments Algorithm

    Full text link
    We present a novel algorithm for learning the parameters of hidden Markov models (HMMs) in a geometric setting where the observations take values in Riemannian manifolds. In particular, we elevate a recent second-order method of moments algorithm that incorporates non-consecutive correlations to a more general setting where observations take place in a Riemannian symmetric space of non-positive curvature and the observation likelihoods are Riemannian Gaussians. The resulting algorithm decouples into a Riemannian Gaussian mixture model estimation algorithm followed by a sequence of convex optimization procedures. We demonstrate through examples that the learner can result in significantly improved speed and numerical accuracy compared to existing learners

    A discriminative HMM/N-gram-based retrieval approach for Mandarin spoken documents

    Get PDF
    In recent years, statistical modeling approaches have steadily gained in popularity in the field of information retrieval. This article presents an HMM/N-gram-based retrieval approach for Mandarin spoken documents. The underlying characteristics and the various structures of this approach were extensively investigated and analyzed. The retrieval capabilities were verified by tests with word- and syllable-level indexing features and comparisons to the conventional vector-space model approach. To further improve the discrimination capabilities of the HMMs, both the expectation-maximization (EM) and minimum classification error (MCE) training algorithms were introduced in training. Fusion of information via indexing word- and syllable-level features was also investigated. The spoken document retrieval experiments were performed on the Topic Detection and Tracking Corpora (TDT-2 and TDT-3). Very encouraging retrieval performance was obtained
    corecore